ABSTRACT

The need for common measures in research on teaching
is legend, and the merits of teaching performance tests to meet this
requirement are explored here. A regression study where teacher
performance tests were used as dependent measures is described.
Sixty-four subjects were given objective-based lessons to teach.
During their lessons, they were rated on the use of six instructional
principles. Following instruction, learners were administered a short
test of achievement and interest. Step-wise regression analyses were
conducted, and variables related to the performance criteria are
described. Suggested modifications of performance tests to enhance
their suitability as dependent measures are discussed. (Author)
After the clattering controversy, the claims and counterclaims
which have enveloped the question of teaching performance tests, this
paper was intended to provide a crystalline moment of unity. Whereas
teaching performance tests have been characterized and challenged as
useful in teaching assessment, teaching analysis and teaching improvement,
the purpose of this paper was to discuss such tests in an important and
blessedly undisputed role: as dependent measures for instructional
experiments involving teacher behavior. The assumed agreement among us is
based on three factors: 1) there is a continuing need for reasonable
dependent measures for instructional experiments, especially ones which can
be used under a wide range of conditions and might conceivably function as
standard references for evaluating the sensitivity of experiments;
2) performance tests are logically related to significant aspects of the
instructional task; 3) the tests are "effectiveness-based," so scores on
them are suitable for the desired outcome nature of a dependent variable.

This paper was intended to sanctify the use of such tests with a neat
and unassailable example of performance tests functioning as anticipated.
Unfortunately, this felicitous rite must be deferred until the future, and
the tenor of the paper has shifted to one of faltering exploration rather
than resounding demonstration.

In lieu of conducting a convocational exercise, I shall rather
provide the details of a research study where performance tests were used
as the dependent measure, present the obtained data analyses, and discuss
some of the difficulties, insurmountable and otherwise, inhibiting
unwavering confidence in the use of performance tests as possible dependent
measures. The study is presented as a reference for discussion; the
particular findings should not be considered to be the central topic of
this presentation.

Overview:

Data were obtained for this study from sixty-four teacher education
candidates, each asked to teach short performance tests to their peers as
part of the instructional requirements for a course in curriculum and
instruction. During these lessons the students were rated by eight trained
observers according to their use of six instructional techniques.
Correlation and step-wise regression analyses were planned for the data,
using achievement scores and interest ratings of students as dependent
measures.

Performance tests:

Eleven different performance tests were used in the study. These
were assigned at random within each group of subjects. The reason for the
great number was to ensure that the task would be new to each subject.
These tests each consisted of a statement of an operationally defined
instructional objective, a sample test item, and approximately two pages
of relevant content on the topic covered by the objective. Students taught
by teachers were asked to complete a short posttest following the fifteen
minute instructional period and were also asked to rate the lesson in
interest on a five point scale. Topics for the tests were concepts
relevant to the course, but likely not to have been encountered by these
subjects, e.g., erosion measures in educational evaluation. Behaviors
called for by the objectives were either discrimination of examples of
concepts, e.g., "is X an example of A?" or classification, "which of the
following four procedures is A an instance of?"

Subjects:

Sixty-four senior and graduate students preparing to be secondary
level teachers were involved in the study. To be admissible for teacher
education work, these students normally need to have a 3.0 grade point
average at UCLA or a comparable institution. These subjects were randomly
assigned to "minilesson" groups.

Raters:

Eight different raters were involved in the observation phase of the
study. These raters were, for the most part, teacher education candidates
who had been exceptionally successful in completing the curriculum and
instruction course. Operating on a Keller (1968) model, these students
received course credit for acting as an assistant/leader to one or two
groups of from five to eight students. The raters, by virtue of their
selection, had already demonstrated capability in the analysis of
instruction according to the techniques for observation. A two-hour
training session, followed by weekly reviews for three weeks, was
administered to each rater.

Instructional Techniques:

Six instructional techniques were to be observed and considered as
the independent variables in this investigation. These techniques can be
exhibited in teacher behavior and have been demonstrated to be reliably
judged (Baker, 1969). The techniques observed were the following:

Direct Practice:       Did the teacher provide opportunity for the
                       class to practice the criterion behavior
                       described in the objective?

Knowledge of Results:  Did the teacher inform the students whether
                       their responses were adequate?

Prompting:             Did the teacher provide cues to allow responses
                       to be more easily made at the outset of
                       instruction and then reduce the students'
                       dependency upon cues?

Individualization:     Did the teacher respond to the individual
                       attributes and experiences of the students
                       by varying instruction for certain individuals?

Task Description:      Did the teacher communicate in unambiguous
                       language what task the learners should focus
                       upon?

Motivation:            Did the teacher attempt to provide incentives
                       or explanations designed to enhance the appeal
                       of attending to instruction?

Justification for the selection of these variables can be found in almost 
any consideration of the literature in instruction. Full descriptions of 
techniques 1, 2, 3, and 5 are presented in a recently prepared set of 
materials (Baker and Quellmalz, 1972). 

Procedures:

All enrolled students in the Curriculum and Instruction class were
to complete at least one "minilesson" during the course of the quarter.
These lessons were assigned at random within groups to each student, one
week before the scheduled conduct of the lesson and the administration of
measures. During each lesson, raters surreptitiously completed brief
rating forms on which the "teacher's" use of the six principles was
assessed. These forms were concealed in a notebook, and the rater sat
off to the side of the instructor. No mention or discussion of the
dimensions recorded on the form followed any lesson. Following the
conclusion of each fifteen minute lesson, students were permitted to
complete the posttest and the interest rating form. There was no time
limit imposed in the testing. Data were collected over a three week period.



Data Analysis:

Average achievement scores for the students were computed in percentage
terms for each teacher. The average rating from 1 to 5 on interest was
also computed.

Analyses were first conducted on the raw data, that is, the average
percentage of achievement and average interest rating for each teacher.
Means and standard deviations by test for achievement and interest ratings
are presented below in Table 1.

Table 1. Means and Standard Deviations by Test
Forms for Achievement and Interest

                   Achievement          Interest
Test              Mean       SD       Mean      SD       n

1                84.69    19.38       1.90     .74      16
2                79.00    14.18       2.02     .32      14
3                81.67    10.36       1.93     .58      12
4                74.00     0.00       2.60     .00       1
5                71.00     0.00       1.90     .00       1
6               100.00     0.00       2.30     .00       1
7                80.67    11.85       2.10     .40       3
8                56.67    11.37       2.20     .70       3
9                61.33    10.07       2.40     .17       3
10               86.00     2.16       1.70     .49       4
11               85.00     0.00       2.00     .00       1
XX (uncoded)     76.00    10.30       2.12     .13       5

TOTAL            79.55    15.14       2.00     .61      64

From the distribution of scores one can see that the plan to distribute
the tests approximately equally among the students did not work out. The
disproportionate assignment of topics is explained by absenteeism in the
minilesson sessions. In all subsequent analyses, data were considered only
for subjects in Tests 1, 2, and 3, where the distribution of subjects was
most equitable (N = 16, 14, 12 respectively).

Observation data were averaged for all subjects. Means and standard
deviations of ratings are presented in Table 2.



Table 2. Means and Standard Deviations of Ratings
of Six Instructional Techniques

Variable                  Mean      SD       n

Practice                  3.06     1.33     64
Knowledge of Results      3.06     1.44
Prompting                 2.17     1.27
Individualization         2.30     1.28
Task Description          3.31      .88
Motivation                2.68     1.27


Table 3. Means and Standard Deviations of Ratings of Six
Instructional Techniques (Tests 1, 2, 3 only)

Variable                  Mean      SD

Practice                  3.00     1.41
Knowledge of Results      2.93     1.5[illegible]
Prompting                 2.19     1.37
Individualization         2.38     1.43
Task Description          3.26     0.91
Motivation                2.81     1.29



Correlation coefficients were computed for the independent variables and
the dependent measures and are reported in Table 4.



The interrelationship among the independent variables is obvious.
Practice was found to correlate significantly (beyond the .01 level)
with each of the independent variables, with the exception of Motivation.
The pattern of relationship for the variable Knowledge of Results (as
expected from the .94 correlation) was identical to that of Practice.
Prompting correlated significantly with each of the other independent
measures. Individualization failed to correlate only with Task
Description. Motivation correlated significantly with Prompting, Task
Description, and Individualization.

Data from both dependent variables, achievement and interest, were
transformed into standard scores (mean = 50, s = 10) within each test and
then combined.
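
The within-test transformation can be sketched as follows. This is a minimal illustration, not the original computation: the function names and the data are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the paper does not say which form was used.

```python
from statistics import mean, stdev


def to_standard_scores(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one test's scores to the target mean and standard deviation.

    Assumption: the sample standard deviation is used. A test with a single
    teacher, or with no spread at all, cannot be rescaled, so every score is
    placed at the target mean.
    """
    if len(scores) < 2:
        return [target_mean for _ in scores]
    m = mean(scores)
    s = stdev(scores)
    if s == 0:
        return [target_mean for _ in scores]
    return [target_mean + target_sd * (x - m) / s for x in scores]


def standardize_within_tests(scores_by_test):
    """Transform each test separately, then pool the transformed scores."""
    combined = []
    for test_id in sorted(scores_by_test):
        combined.extend(to_standard_scores(scores_by_test[test_id]))
    return combined


# Hypothetical per-teacher mean achievement percentages, keyed by test number.
hypothetical_scores = {
    1: [95.0, 60.0, 88.0, 100.0],
    2: [80.0, 65.0, 92.0],
    3: [70.0, 85.0, 90.0],
}
pooled = standardize_within_tests(hypothetical_scores)
```

Because each test is rescaled on its own, a teacher's transformed score expresses standing relative to others who taught the same test, which is what removes differences in test difficulty before pooling.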

Correlations of the independent variables with both raw and transformed
scores for the dependent measures are presented below in Table 5.

Table 5. Correlation of Independent
and Dependent Variables

                            Achievement             Interest
Variable                  Raw   Transformed      Raw   Transformed

Practice                  .28*     .28*          .36**    .34*
Knowledge of Results      .18      .19           .37**    .35*
Prompting                 .26*     .30*          .15      .23
Individualization         .29*     .29*          .09      .18
Task Description          .07      .02           .04      .00
Motivation               -.05     -.02           .30*     .34*

 * p < .05
** p < .01

N = 42, Tests 1, 2, 3 only
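
As a rough check on the significance flags, a correlation can be converted to a t statistic with N - 2 degrees of freedom. The sketch below rests on assumptions the paper does not state: that the tests were one-tailed in the positive direction, and that the critical values are the usual tabled values for 40 degrees of freedom (1.684 at .05 and 2.423 at .01). The function names are illustrative only.

```python
import math

N_TEACHERS = 42          # teachers in Tests 1, 2, and 3
T_CRIT_05 = 1.684        # assumed: one-tailed critical t, 40 df, p = .05
T_CRIT_01 = 2.423        # assumed: one-tailed critical t, 40 df, p = .01


def t_from_r(r, n=N_TEACHERS):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def significance_flag(r, n=N_TEACHERS):
    """Return '**', '*', or '' under the one-tailed assumption above."""
    t = t_from_r(r, n)
    if t > T_CRIT_01:
        return "**"
    if t > T_CRIT_05:
        return "*"
    return ""


# Correlations of Practice with the four dependent measures (Table 5).
for r in (0.28, 0.28, 0.36, 0.34):
    print(f"r = {r:.2f}  t = {t_from_r(r):.2f}  {significance_flag(r)}")
```

Under these assumptions the flags agree with those reported in the table. With a two-tailed criterion several of the smaller coefficients would fall short of the .05 level, so the direction of the test matters when reading the table.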
Equating the tests by transforming scores resulted in little change in
correlational values. To summarize, the variables found to be most
related to achievement were the observed use of Practice, Prompting and
Individualization. Variables significantly related to interest ratings
by students were Practice, Knowledge of Results and Motivation. A previous
study (Baker, 1969) using a sample of 80 teachers and 20 different
performance tests obtained significant correlations of about the same
magnitude for the use of Practice, Individualization and Knowledge
of Results.

Step-wise regression analyses were conducted as intended, and a brief
summary is given in the Appendix. However, because of the
intercorrelations among the predictors, conclusions from these analyses
are tenuous at best.
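
One way to see why the regression results are tenuous is the variance inflation factor. The sketch below is an added illustration, not part of the original analysis; the function name is hypothetical. It uses only the .94 correlation reported between Practice and Knowledge of Results. Since the squared multiple correlation of one predictor with all the others can be no smaller than its squared correlation with any single one of them, the value computed here is a lower bound on the inflation for either variable.

```python
import math


def vif_lower_bound(r):
    """Lower bound on the variance inflation factor for a predictor that
    correlates r with one other predictor in the regression."""
    return 1.0 / (1.0 - r * r)


r_practice_knowledge = 0.94   # correlation reported in the paper
vif = vif_lower_bound(r_practice_knowledge)
se_multiplier = math.sqrt(vif)

print(f"VIF >= {vif:.2f}; standard errors inflated by >= {se_multiplier:.2f}x")
```

With the sampling variance of either regression weight inflated at least eightfold, the order in which Practice and Knowledge of Results enter a step-wise solution can turn on trivial differences in the sample.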

Discussion of the Study 

Certain variables were found to relate significantly to achievement
and ratings of interest. Because of the high correlation of Practice and
Knowledge of Results, perhaps only one of these variables should be
observed in future investigations, probably Practice, because it was found
to be related positively to both achievement and interest dimensions.

Exploring Problems with the Dependent Measure 
The multiple performance test approach 

If one were in a position to use a single performance test as a
dependent measure, problems would be both dissipated and created. The use
of a single performance test would have precluded the necessity to
transform scores, but would reduce the generalizability (if any) of the
obtained results. Transformation is required, not to weight the
contribution of each test, but to neutralize differences in the difficulty
of each test.

Multiple tests also raise the issue of possible performance test/
independent variable interactions, in that a given test might be more
conducive to the use of certain techniques than others.

The question of reliability 

Jason Millman will report his work on the psychometric properties
of performance tests. Clearly stability coefficients (test-retest of
teachers) would be important to obtain. The present study was designed
so that each subject taught only once, and thus the design precludes
such analysis. Reliability analyses were computed for Tests 1, 2, and 3,
where 16, 14, and 12 teachers were involved. In Table 6 below, these
findings are summarized.





Table 6. Achievement Reliability on Test Items for
Performance Tests 1, 2, and 3.

Test    No. of items     Mean      SD      alpha      N

1            10          8.97     1.67      .78      78
2             5          3.76     1.27      .60      71
3            10          8.07     1.41      .45      60
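
For readers unfamiliar with the coefficient, the sketch below shows how an internal-consistency alpha of this kind is computed from item-level data. It assumes that "alpha" in the table is Cronbach's coefficient alpha, which the paper does not spell out; the function name and the learner responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of learners, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    item_columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)


# Hypothetical right/wrong (1/0) responses of six learners to a 5-item posttest.
hypothetical_responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
alpha = cronbach_alpha(hypothetical_responses)
print(f"alpha = {alpha:.2f}")
```

Because alpha depends on the ratio of item variance to total-score variance, a posttest on which nearly every learner reaches criterion will tend to show a low coefficient even when the items measure the objective well.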



Because performance test items are designed to be homogeneous measures of
the objective, difficulty persists in interpreting such data, particularly
in the light of the pursuit of substitute reliability-determination
procedures for objective-referenced tests. In particular, it is not clear
that one can generalize the results of analyses performed on instructor
groups with a wide distribution of teaching talent (as measured by
performance) to groups of instructors trained to behave more homogeneously
and to produce more similar results.


Differences in Requirements for Performance Tests
for Evaluation and Research Purposes

The informal banter in educational circles alleges that instructional
research should be carefully done with enormous attention to detail, and
that, in contrast, evaluation studies may be permitted to proceed with
a much more casual view regarding design, controls, and the other
catchwords of the educational research community. The performance test
notion provides an example where, I would suggest, the practiced precision
requirements must be reversed. When performance tests are to be used for
decision purposes, such as the evaluation or selection of teachers, one
would need, on ethical grounds, a clear concern with the consistency and
validity of the experience for the individuals tested. If instructional
improvement programs, or in some cases, career chances are modified at all
by the use of such measures, one would wish to have heightened confidence
in the basis for the decision. On the other hand, and at the risk of
sounding wild and contemptuous of standards of research rigor, constraints
on the use of performance tests to investigate the relationship of
independent variables might be somewhat relaxed. For instance, if the
tests have imperfect reliability coefficients, in light of imperfect
methodology, the research ethos is to report the data, qualify one's
conclusions and encourage replication. The trained consumer of educational
research takes his or her own risks. Each should be able to evaluate the
validity of an investigation, particularly if the designer of the study is
careful to disclaim proving anything and clearly indicates the limitations
of the work. At best, the study will be replicated and inadequacies
learned first-hand. At worst, someone might design and conduct an
extension of the work and waste some effort if the original study were not
carefully reported or understood.
The control of personal destiny and the interpretation of knowledge
is in the hands of researchers, who presumably subscribe to a minimum
set of standards in design and interpretation of empirical work. On the
other hand, teachers who are evaluated by use of the performance test and
for whom decisions may have serious consequences are not in positions of
control. Moreover, it is likely that the educator conducting the
evaluation, a school principal, for example, may not be sensitive to the
inadequacies of the data. Thus, I would hope to see performance tests
developed to a high level of precision when they are to be used for
personnel decisions, and to see less well-refined tests permitted to
function in an exploratory role in instructional research.

Other contrasts between the attributes of tests for evaluation and
research purposes may be drawn. For instance, one would expect that
perceived relevance to the task of teaching would be at a higher premium in
evaluation rather than research problems. Similarly, the need to permit
adequate preparation time would vary. Certainly, in instructional research,
a dependent measure with little variability is not desirable. In contrast,
performance tests for teacher evaluation might not require as much
variability and could be used to identify only the aberrant individual, one
who has been given time and assistance, and still is unable to demonstrate
influence over the outcomes of instruction.

Performance Tests as a Technology 

Performance tests represent the beginning of a technology, and if the
history of technology at large is repeated, they will not be used only or
primarily for the purpose for which they were originally intended. The
original experimenters with laser technology had some particular purposes
in mind; the present use of lasers is wide-ranging and continues to be
explorative. Scotch tape was developed for a given purpose, but as users
of the product experimented, the invention gained expanded functions.
New uses were suggested and the technology was explored as changes in the
product were made, so that Scotch tape has a range of utility, from
providing a writing surface on inhospitable exteriors to pasting down
curls on cheeks of teen-age girls.

The performance test may become an effective tool if it is considered
as an invention, to be tested against various uses and fruitful
modifications and not prematurely ossified. If the performance test
proves adaptive to the broad requirements of the field, its utility as
a dependent measure might be only one of its important contributions.
Appendix



Analysis of the data was pursued, using step-wise regression, and 
data are presented in Tables A and B from transformed scores. 



Table A. Step-wise Regression Summary for
Transformed Achievement Scores

Variable                 Multiple R        F

Prompting                   .307
Practice                    .345         5.062
Knowledge of Results        .438         4.128
Task Description            .475          .784
Individualization           .492          .952
Motivation                  .497          .208

N = 42



Table B. Step-wise Regression Summary for
Transformed Interest Ratings

Variable                 Multiple R        F

Knowledge of Results        .351
Motivation                  .452         7.164
Task Description            .502         3.91
Practice                    .528         1.388
Individualization           .539          .579



Because of the correlation among predictor variables, analyses were rerun
for both sets of data. For the transformed achievement dependent measure,
Knowledge of Results was dropped from the set of variables. The variables
identified in the analyses were Prompting, Practice and Task Description.

For the transformed interest ratings, Practice was deleted, again because of
the .94 correlation obtained with Knowledge of Results. The order of the
predictors was as follows: Knowledge of Results, Motivation, Task
Description, and Individualization.



Table C. Summary of Step-wise Regression with Transformed
Achievement Scores Deleting Knowledge of Results

Variable                 Multiple R        F

Prompting                   .307          .914
Practice                    .345          .862
Task Description            .370          .251
Motivation                  .381          .579
Individualization           .398          .544



Table D. Summary of Step-wise Regression with Transformed
Interest Scores Deleting Practice

Variable                 Multiple R        F

Knowledge of Results        .351
Motivation                  .452         6.020
Task Description            .503         2.763
Individualization           .513         0.526